Precise N-Gram Probabilities from Stochastic Context-Free Grammars

Authors

  • Andreas Stolcke
  • Jonathan Segal
Abstract

We present an algorithm for computing n-gram probabilities from stochastic context-free grammars, a procedure that can alleviate some of the standard problems associated with n-grams (estimation from sparse data, lack of linguistic structure, among others). The method operates via the computation of substring expectations, which in turn is accomplished by solving systems of linear equations derived from the grammar. The procedure is fully implemented and has proved viable and useful in practice.
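The core idea of the abstract, obtaining exact expectations by solving linear equations derived from the grammar rules, can be illustrated in a minimal sketch. The toy grammar, symbol names, and the restriction to unigram (single-terminal) expectations below are illustrative assumptions, not the paper's full method, which extends such expectations to arbitrary n-grams:

```python
import numpy as np

# A hypothetical toy SCFG (not from the paper): rules as (lhs, rhs, prob).
# Uppercase symbols are nonterminals, lowercase are terminals.
rules = [
    ("S",  ["NP", "VP"], 1.0),
    ("NP", ["det", "n"], 0.8),
    ("NP", ["NP", "pp"], 0.2),
    ("VP", ["v", "NP"],  1.0),
]
nonterms = ["S", "NP", "VP"]
idx = {x: i for i, x in enumerate(nonterms)}

# M[y, x] = expected number of occurrences of nonterminal x
# introduced by a single expansion of nonterminal y.
M = np.zeros((len(nonterms), len(nonterms)))
for lhs, rhs, p in rules:
    for sym in rhs:
        if sym in idx:
            M[idx[lhs], idx[sym]] += p

# Expected occurrence counts e per derivation satisfy e = b + M^T e,
# where b is the indicator of the start symbol S, so we solve the
# linear system (I - M^T) e = b (assuming a consistent grammar).
b = np.zeros(len(nonterms))
b[idx["S"]] = 1.0
e = np.linalg.solve(np.eye(len(nonterms)) - M.T, b)

# Expected terminal (unigram) counts per derivation follow directly.
counts = {}
for lhs, rhs, p in rules:
    for sym in rhs:
        if sym not in idx:
            counts[sym] = counts.get(sym, 0.0) + p * e[idx[lhs]]

print({x: float(v) for x, v in zip(nonterms, e)})      # expected nonterminal counts
print({t: float(round(c, 3)) for t, c in counts.items()})  # expected unigram counts
```

Here the left recursion NP → NP pp makes the expectation of NP the solution of e(NP) = 1 + e(VP) + 0.2·e(NP), which the linear solve handles exactly; no enumeration of the (infinite) language is needed.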


Similar resources

Combination Of N-Grams And Stochastic Context-Free Grammars For Language Modeling

This paper describes a hybrid proposal to combine n-grams and Stochastic Context-Free Grammars (SCFGs) for language modeling. A classical n-gram model is used to capture the local relations between words, while a stochastic grammatical model is considered to represent the long-term relations between syntactical structures. In order to define this grammatical model, which will be used on large-vo...

A Hybrid Language Model based on Stochastic Context-free Grammars

This paper explores the use of initial Stochastic Context-Free Grammars (SCFG) obtained from a treebank corpus for the learning of SCFG by means of estimation algorithms. A hybrid language model is defined as a combination of a word-based n-gram, which is used to capture the local relations between words, and a category-based SCFG with a word distribution into categories, which is defined to re...

Bayesian Learning of Probabilistic Language Models

The general topic of this thesis is the probabilistic modeling of language, in particular natural language. In probabilistic language modeling, one characterizes the strings of phonemes, words, etc. of a certain domain in terms of a probability distribution over all possible strings within the domain. Probabilistic language modeling has been applied to a wide range of problems in recent years, ...

Stochastic Tree-Adjoining Grammars

The notion of stochastic lexicalized tree-adjoining grammar (SLTAG) is defined and basic algorithms for SLTAG are designed. The parameters of a SLTAG correspond to the probability of combining two structures each one associated with a word. The characteristics of SLTAG are unique and novel since it is lexically sensitive (as n-gram models or Hidden Markov Models) and yet hierarc...


Journal:

Volume   Issue

Pages  -

Publication date: 1994